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Abstract 

We present an exact and complete algorithm to isolate the real solutions of a zero-dimensional bi- 
variate polynomial system. The proposed algorithm constitutes an elimination method which improves 
upon existing approaches in a number of points. First, the amount of purely symbolic operations is 
significantly reduced, that is, only resultant computation and square-free factorization is still needed. 
Second, our algorithm neither assumes generic position of the input system nor demands for any change 
of the coordinate system. The latter is due to a novel inclusion predicate to certify that a certain region 
is isolating for a solution. Our implementation exploits graphics hardware to expedite the resultant 
computation. Furthermore, we integrate a number of filtering techniques to improve the overall per- 
formance. Efficiency of the proposed method is proven by a comparison of our implementation with 
two state-of-the-art implementations, that is, Lgp and Maple's Isolate. For a series of challenging 
benchmark instances, experiments show that our implementation outperforms both contestants. 

1 Introduction 

Finding the real solutions of a bivariate polynomial system is a fundamental problem with numerous 
applications in computational geometry, computer graphics and computer aided geometric design. In 
particular, topology and arrangement computations for algebraic curves [HI [HJ El [20] crucially rely on 
the computation of common intersection points of the given curves (and also the curves defined by their 
partial derivatives). For the design of robust and certified algorithms, we aim for exact methods to de- 
termine isolating regions for all solutions. Such methods should be capable of handling any input, that 
is, even systems with multiple solutions. The proposed algorithm Bisolve constitutes such an exact and 
complete approach. Its input is a zero- dimensional (i.e., there exist only finitely many solutions) polyno- 
mial system f(x, y) = g(x, y) = defined by two bivariate polynomials with integer coefficients. Bisolve 
computes disjoint boxes £?i, . . . , B m C M 2 for all real solutions, where each box Bi contains exactly one 
solution (i.e., Bi is isolating). In addition, the boxes can be refined to an arbitrary small size. 

Main results. Bisolve constitutes a classical elimination method which follows the same basic idea as 
the GRID method from [12] and the hybrid method proposed in |21| . More precisely, in a first step, the 
variables x and y are separately eliminated by means of a resultant computation. Then, in the second 
step, for each possible candidate (represented as pair of projected solutions in x- and y-direction), we 
check whether it actually constitutes a solution of the given system or not. The proposed method comes 
with a number of improvements compared to the aforementioned approaches and also to other existing 
elimination techniques |2"5| |29[ 113] . First, we tremendously reduced the amount of purely symbolic 
computations, namely, our method only demands for resultant computation and square-free factorization 
of univariate polynomials with integer coefficients. Second, our implementation profits from a novel 
approach [T71 EE] to compute resultants exploiting the power of Graphics Processing Unite (GPUs). We 
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remark that, in comparison to the classical resultant computation on the CPU, the GPU implementation 
is typically more than 100-times faster. Our experiments show that, for the considered instances, the 
resultant computation is no longer a "global" bottleneck of an elimination approach. Third, the proposed 
method never uses any kind of a coordinate transformation, even for non-generic inputQ The latter is 
due to a novel inclusion predicate which combines information from the resultant computation and a 
homotopy argument to prove that a certain candidate box is isolating for a solution. Since we never apply 
a change of coordinates, our method particularly profits in the case where / and g are sparse or where we 
are only interested in "local" solutions within a given box. Finally, we integrated a series of additional 
filtering techniques which allow us to significantly speed up the computation for many instances. 

We implemented our algorithm as a prototypical package of Cgal [36J and ran our software on nu- 
merous challenging benchmark instances. For comparison, we considered two currently state-of-the-art 
implementations, that is, Isolate (based on Rs by Fabrice Rouillier with ideas from [29J) from Maple 13 
and Lgp by Xiao-Shan Gao et al. [8]. Our experiments show that our method is efficient as it outper- 
forms both contestants for most instances. More precisely, our method is comparable for all considered 
instances and typically between 5 and 10-times faster. For some instances, we even improve by a factor of 
50 and more. Our filters apply to many input systems and crucially contribute to the overall performance. 
We further remark that the gain in performance is not solely due to the resultant computation on the 
GPU but rather due to the combination of the sparse use of purely symbolic computations and efficient 
(approximate) subroutines. We prove the latter fact by providing running times with and without fast 
GPU-resultant computation. 

Related Work. Since polynomial root solving is such an important problem in several fields, plenty of 
distinct approaches exist and many textbooks are dedicated to this subject. In general, we distinguish 
between two kinds of methods. 

The first comprises non-certified or non-complete methods which give, in contrast to our goal here, 
no guarantee on correctness or termination (e.g., if multiple roots exists). Representatives of this cat- 
egory are numerical (e.g. homotopy methods [33]) or subdivision method^] (e.g., [271 El Ej ) • A major 
strength of these methods is that they are very efficient for most instances due to their use of approximate 
computations such as provided by IntBis, ALIAS, IntLab or MPFI. 

The second category consists of certified and complete methods, to which ours is to be added. So far, 
only elimination methods based on (sparse) resultants, rational univariate representation, Groebner bases 
or eigenvalues have proven to be efficient representatives of this category; see, for instance, [2"Sl [TTj [33 1 157] 
for introductions to such symbolic approaches. Common to all these methods is that they combine a 
projection and a lifting step similar to the proposed approach. Recent exact and complete implemen- 
tations for computing the topology of algebraic curves and surfaces [HI [201 E] also make use of such 
elimination techniques. However, already this low dimensional application shows the main drawback of 
elimination methods, that is, they tremendously suffer from costly symbolic computations. Furthermore, 
the given system might be in non-generic position which makes the lifting step non-trivial. In such "hard 
situations" , the existing approaches perform a coordinate transformation (or project in generic direction) 
which eventually increases the complexity of the input polynomials. In particular, if we are only interested 
in "local" solutions within a given box, such methods induce a huge overhead of purely symbolic com- 
putations. The proposed algorithm constitutes a contribution in two respects: The number of symbolic 
steps are crucially reduced and partially (resultant computation) outsourced to the GPU. In addition, 
generic and non-generic situations are treated in the same manner and, thus, a coordinate transformation 
which induces an overhead of symbolic computations is no longer needed. 

The system / = g — is non-generic if there exist two solutions sharing a common coordinate. 
2 Subdivision methods can be made certifying and complete when considering worst case separation bounds for the 
solutions, an approach which has not shown effective in practice so far. 
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2 Setting 



The input of our algorithm is the following polynomial system 

f(x,y) = Yl /<>'V 9(x,y)= Yl "'r' J U J 

i,jEN'.i+j<m i,j£N:i+j<n 

where f,g£ 7L\x,y\ are polynomials of total degrees m and n, respectively. We also write 

m x rn y n x n y 

f{ x ,y) = Yf ( i X \y)x % = Yfl y) ^)y l and 9{x,y) = Y,9 l t\y)^ = Y.^ ) ^)y\ 

i=0 i=0 i=0 i=0 

where f- y \ G , gf G Z[y] and m x , n x and m y , n y denote the degrees of / and g considered 

as polynomials in x and y, respectively. Throughout the paper, it is assumed that / and g have no 
common factors]^] Hence, the set Vc := {{x,y) G C 2 \f(x,y) = g(x,y) = 0} of (complex) solutions of (2.1) 
is zero-dimensional and consists, by Bezout's theorem, of at most m ■ n distinct elements. 

Our algorithm outputs disjoint boxes C M 2 such that the union of all contains all real solutions 

Vm := {(x,y) G R 2 \f(x,y) = g(x,y) = 0} = V c nR 2 



of (2.1) and each Bk is isolating, that is, it contains exactly one solution. 

Notation. For an interval / = (a, b) C M, ra; := (a + b)/2 denotes the center and rj := (6 — a)/2 the 
radius of /. For an arbitrary m G C and r G M + , A r (m) denotes the disc with center m and radius r. 



3 The Algorithm 



3.1 Resultants 

Our algorithm is based on well known elimination techniques, namely, to consider the projections 



V, 



{x G C\3y G C with f{x, y) = g(x, y) = 0}, := {y G C\3x G C with /(x, y) = g(x, y) = 0} 

of all complex solutions Vc onto the x- and y-coordinate. Resultant computation is a well studied tool to 
obtain an algebraic description of these projection sets, that is, polynomials whose roots are exactly the 
projections of the solution set Vc- The resultant = res(/, g, y) of / and g with respect to the variable 
y is the determinant of the (m y + n y ) x (m y + n y ) Sylvester matrix: 



S {y) (f,g) 
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From the definition, it follows that B> y > G Z[x] has degree less than or equal to m ■ n. The resultant 
= res(f,g,x) of / and 5 with respect to x is defined in completely analogous manner by considering 
/ and g as polynomials in x instead of y. As mentioned above the resultant polynomials have the following 
important property (cf. [4j for a proof): 



Theorem 1 The roots of and R^ are exactly the projections of the solutions of (2.1) onto thex- and 
y-coordinate, respectively. More precisely, V^ = {x G C\R^ y \x) = 0} and = {y G C\R^ x \y) = 0}. 
The multiplicity of a root a of R^ (R^ ) is the sum of the intersection multiplicitie^ of all solutions of 
(2.1) with x-coordinate (y-coordinate) a. 



3 Otherwise, / and g have to be decompos ed in to common and non-common factors (not part of our algorithm). 

4 The multiplicity of a solution (xo, yo) of (2.1l is defined as the dimension of the localization of C[x,y]/(f, g) at (xo,yo) 



considered as C-vector space (cf.[U p. 148]) 
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3.2 Isolating the Solutions: Project, Separate and Validate 

We start with the following high level description of the proposed algorithm which decomposes into three 



subroutines: In the first step (Project), we project the complex solutions Vc of (2.1) onto the x- and 



onto the y-axis. More precisely, we compute the restrictions := n M and := n M. 

(x) (v) 

of the complex projection sets and to the real axes and isolating intervals for their elements. 
Obviously, the real solutions are contained in the cross product C := V^ x) x V^ y) CR 2 . In the second 
step (Separate), we compute isolating discs which well separate the projected solutions from each other. 
The latter prepares the third step (Validate) in which candidates of C are either discarded or certified 



to be a solution of (2.1). Our main theoretical contribution is the introduction of a novel predicate to 



ensure that a certain candidate (a, 0) S CflFt actually fulfills f(a, (3) = g(a, f3) = (cf. Theorem |4|. For 



all candidates (a, (3) £ C\Vfo, simple interval arithmetic suffices to exclude (a, (3) as a solution of (2.1). 

We remark that, in order to increase the efficiency of our implementation, we also introduce additional 
filtering techniques to eliminate many of the candidates in C. However, for the sake of clarity, we refrain 
from integrating our filtering techniques in the following description of the three subroutines. Filtering 



techniques are covered separately in Section 4.2 Section 4.1 briefly discusses a highly parallel algorithm 
on the graphics hardware to accelerate computations of the resultants needed in the first step. 

Project: We compute the resultant R := = ves(f,g,y) € Z[x] and a square-free factorization 
of R. More precisely, we determine square- free and pairwise coprime factors r% £ Z[x], i = 1, . . . , deg(i?), 
such that R(x) = Hi=i ( r i( x )T- We remark that, for some i 6 {1, . . . , deg(i?)}, ri(x) = 1. Yun's 
algorithm [18[ Alg. 14.21] constructs such a square-free factorization by essentially computing greatest 
common divisors of R and its higher derivatives in an iterative way. Next, we isolate the real roots ajj, 
j = 1, . . . ,£{, of the polynomials rj. That is, we determine disjoint isolating intervals I(ajj) C M such that 
each interval I(a,ij) contains exactly one root (namely, ctjj) of rj and the union of all I(aij), j = 1, . . . , £i, 
covers all real roots of rj. For the real root isolation, we consider the Descartes method |XQ^ 130] as a 
suited algorithm. From the square- free factorization we know that ctij,j = 1, . . . ,1%, is a root of R with 
multiplicity i. 

Separate: We separate the real roots of R = W- y ' from all other (complex) roots of R, a step which 
is crucial for the final validation. More precisely, let a = aj ,j be the jo-th real root of the polynomial 
rig, where io G {1, . . . , deg(i?)} and jo 6 {1, . . . , £i } are arbitrary indices. We refine the corresponding 
isolating interval / = (a, b) := 1(a) such that the disc As ri (mj) does not contain any root of R™' except a. 
For the refinement of I, we use quadratic interval refinement [U [21] (QIR) which constitutes a highly 
efficient method because of its simple tests and the fact that it eventually achieves quadratic convergence. 

In order to test whether the disc Ag T . / (m/) isolates a from all other roots of R, we introduce a novel 
method based on the following test: 



T^(m,r) : \p(m)\ - if ^ 



k>l 



r k > 0, 



where p G M.[x] denotes an arbitrary polynomial and m, r, K arbitrary real values. Then, the following 
theorem holds (cf. Appendix [5] for a proof): 

Theorem 2 Consider a disk A = A m (r) C C with center m and radius r. 

1. If T^(m,r) holds for some K > 1, then the closure A of A contains no root of p. 

2. If Tft(m,r) holds for a K > \f 7 L, then A contains at most one root of p. 

Sin y 
now directly applies to the above scenario. More precisely, I is refined until T 3 ^° (mj, 8r/) 

and T^'(mj,8rf) holds for all i ^ iq. If the latter is fulfilled, Ag rj (m/) isolates a from all other roots of 
R. In this situation, we obtain a lower bound LB(a) for |-R(z)| on the boundary of A(a) := A2 rj (m/): 



4 



Lemma 1 The disc A(a) = A2 ri (mj) isolates a from all other (complex) roots of R and, for any z on 
the boundary dA(a) of A(a), it holds that \R(z)\ > LB (a) := 2- i °- de ^\R(m I - 2r/)|. 

Proof: A(a) is isolating as already Ag ri (mi) is isolating. Then, let (3 ^ a be an arbitrary root of R 
and d := \j3 — mj\ > 8rj the distance between (3 and to/. Then, for any point z G dA{a), it holds that 

\z—B\ d — 2rr 4rr 1 , \z — a\ rr 1 

> -; — — = 1 - -; — — > - and , -— — I r > — > 



\(mj - 2rj) - P\ d + 2ri (f + 2r 7 2 \(m I -2r I )-a\ 3rj 4' 

Hence, it follows that 

1^)1 > TT \ Z ~P\ . ( \ z ~ a \ V > 4 -io 2 -degOR)+*o 

\R{mi-2n)\ ^ a L R L w=0 \(mi-2rj)-P\ \ \ (mj - 2rj) - a\ J 

where each root (3 occurs as many times in the above product as its multiplicity as a root of R. I 

We evaluate LB(a) = 2^°~ dcg ^\R(mj — 2rj)\ and store the interval 1(a), the disc A(a) and the 
lower bound LB(a) for |-R(z)| on the boundary <9A(a) of A(a). 

Proceeding in exactly the same manner for each real root a of RS y \ we get an isolating interval 1(a), 
an isolating disc A(a) = A2 ri (mj) and a lower bound LB(a) for \R^\ on dA(a). For the resultant 
polynomial R( x \ Project and Separate are processed in exactly the same manner: We compute 
R( x ^ and a corresponding square- free factorization. Then, for each real root j3 of R^ x \ we compute a 
corresponding isolating interval I(/3), a disc A(j3) and a lower bound LB(j3) for \R^ X '\ on dA(/3). 

Validate; We start with the following theorem: 

Theorem 3 Let a and (3 be arbitrary real roots of R^ and R^ x \ respectively. Then, 



1. the polydisc A(a,(3) := A(a) x A((3) C C contains at most one (complex) solution of (2.1). If 



A (a, (3) contains a solution of (2.1), then this solution is real valued and equals (a, (3). 
2. For an arbitrary point (z\, Z2) G C 2 on the boundary of A(a, /3), it holds that 

> LB(a) ifzx e dA(a), and \R {x) (z 2 )\ > LB(/3) if z 2 G dA(p). 

Proof: (1) is an easy consequence from the construction of the discs A(a) and A(f3). Namely, if A(a, (3) 



contains two distinct solutions of (2.1), then they would differ in at least one variable. Thus, one of the 
discs A(a) or A((3) would contain two roots of R^ v ' or R^ x \ Since both discs are isolating for a root of 
the corresponding resultant polynomial, it follows that A(a, (3) contains at most one solution. In the case, 



where A(a,(3) contains a solution of (2.1), this solution must be real since, otherwise, A(a,f3) would also 
contain a corresponding complex conjugate solution (/ and g have real valued coefficients). (2) follows 
directly from the definition of A(a,(3), the definition of LB(a), LB(f3) and LemmajlJ I 



We denote B(a,(3) = 1(a) x I ((3) C M. 2 a candidate box for a real solution of ( |2.l[ ), where a and (3 
are real roots of R^ and R( x \ respectively. Due to Theorem [3J the corresponding "container polydisc" 
A(a,(3) C C 2 either contains no solution of (2.1) or (a, (3) is the only solution contained in A(a,(3). 



Hence, for each candidate pair (a, (3) G C, it suffices to show that either (a, (3) is no solution of (2.1) 
or the corresponding polydisc A(a,(3) contains at least one solution. In the following steps, we fix the 
polydiscs A(a, (3) whereas the boxes B(a, (3) are further refined (by further refining the isolating intervals 
1(a) and I(f3)). We also introduce exclusion and inclusion predicates such that, for sufficiently small 



B(a,(3), either (a, (3) can be discarded or certified as a solution of (2.1). 
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In order to exclude a candidate box, we use simple interval arithmetic. More precisely, we evaluate 
Of(B(a,f3)) and Og(B(a,/3)), where □/ and Dg constitute box functions for / and g, respectively: If 



either Of(B(a,P)) or U\g(B(a, /?)) does not contain zero, then (at,P) cannot be a solution of (2.1). Vice 
versa, if (a, (3) is not a solution and B(a,(3) becomes sufficiently small, then either ^ df(B(a, (3)) or 
^ Og(B(a, f3)) and our exclusion predicate applies. 

It remains to provide an inclu sion predicate, that is, a method to ensure that a certain candidate 
(a, j3) £ C is actually a solution of (2.1). We first rewrite the resultant polynomial RS V > as 



R {y) ( 



,(y) 



(x,y) ■ f{x,y) + v iy) (x,y) ■ g{x,y) 



where u^ v \ «W G Z[x, y]. Furthermore, and i;^ can be expressed as determinants of " Sylvester-like" 
matrices UM and V( y \ More precisely, and V® are obtained from S^ y ' (f, g) by replacing the last 
column with vectors (y n y~ 1 ... 1 . . . 0) T and (0 ... y m y~ l . . . 1) T of appropriate size, respectively |X9(, 
p. 287]. Both matrices have size (n y +m y ) x (n y +m y ) and univariate polynomials in x (the first n y + m y — 1 
columns) or powers of y (only the last column) or zeros as entries. We now aim for upper bounds 
for \u^\ and \v^\ on the polydisc A(a,(3). The polynomials and have huge coefficients and 
their computation, either via a signed remainder sequence or via determinant evaluation, is very costly. 
Hence, we directly derive such upper bounds from the corresponding matrix representations without 
computing vS y ' and Due to Hadamard's bound, \vW\ is smaller than the product of the 2-norms 
of the column vectors of U^ y \ The absolute value of each of the entries of can be easily upper 
bounded by using interval arithmetic on a box in C 2 that contains the polydisc A(a,/3). Hence, we get 
an upper bound on the 2— norm of each column vector and, thus, an upper bound UB(a, (3, u^ y ') for \u^\ 
on A(a,/3) by multiplying the bounds for the column vectors. In the same manner, we also derive an 
upper bound UB(a,p,vM) for | -yW| on A(a,P). With respect to our second projection direction, we 
write 

R (x) = u (x) . f + 

«W • g with corresponding polynomials u^ x \ £ Z[x,y]. In exactly the same 
manner as done for RW), we compute corresponding upper bounds UB(a,P,uW) and UB(a,/3,v^) for 
\u^\ and \v( x >\ on A(a, jS). 



Theorem 4 If there exists an (xq, yo) G A(a,/3) with 

UB(a, 0, U M) • \f(x , y )\ + UB(a, P, v®) ■ \g(x , y )\ < LB (a) and 
UB{a,pM x) ) ■ \f(x ,y )\ + UB(a,P,v^) ■ \g(x ,y )\ < LB(J3), 



(3-1) 
(3-2) 



then A(a,(3) contains a solution of (2.1) and, thus, f(a,P) = 0. 



Proof: The proof uses a homotopy argument. Namely, we consider the parameterized system 
f(x, y)-(l-t)- f(x , y ) = g(x, y) - (1 - t) ■ g(x , y ) = 0, 



(3-3) 



1, (3.3) is equivalent to our initial system (2.1). For 



where t is an arbitrary real value in [0,1]. For t 
i = 0, (3.3) has a solution in A(a, 0), namely, (xo,yo). The complex solutions of (3.3) continuously depend 
on the parameter t. Hence, there exists a "solution path" V : [0, 1] i— > C 2 which connects T(0) = (xo,yo) 
with a solution T(l) £ C 2 of (2.1). We show that T(t) does not leave the polydisc A(a,/3) and, thus, 



(2.1) has a solution in A(a,P): Assume that the path T(t) leaves the polydisc, then there exists a 
t' £ [0, 1] with (x',y') = T(t') £ dA(a,P). We assume that x' £ dA(a) (the case y' £ dA(P) is treated 



in analogous manner). Since (x',y') is a solution of (3.3) for t = t' , we must have \ f(x',y')\ < \f(xo,yo) 
and \g(x',y')\ < |^(a;o,yo)|. Hence, it follows that 



\R (y) {x')\ 



{x ,y')f{x ,y) +v i:y) (x ,y')g{x ,y')\ < \u {y} (x ,y')\ • \ f(x',y')\ + \v [y) (x', y')\ • \g(x ,y')\ 



(y), 



< UB(a,p,uM) • \f(x ,y )\ + UB(a,p,v™) • \g(x ,y )\ < LB (a). 
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This contradicts the fact that \R^ y \x')\ is lower bounded by LB(a). It follows that A(a,/3) contains a 
solution of (2.1) and, according to Theorem [3J this solution must be (a,/3). I 



Theorem [4] now directly applies as an inclusion predicate. Namely, in each refinement of B(a,f3), we 
choose an arbitrary (xq,uo) G B(a,f3) (e.g., the center (rrij^^nij^-j)) of the candidate box B(a,/3)) and 
check whether both inequalities (3.1) and (3.2) are fulfilled. If (a,/3) is a solution of (2.1), then both 



inequalities eventually hold and, thus, we have shown that (a, (3) is a solution. 

We remark that the upper bounds UB(a,/3,u^), UB(a,f3,v^), UB(a,P,vl x) ) and UB(a,/3,v^) are 
far from being optimal. Nevertheless, our inclusion predicate is still efficient since we can approximate the 
potential solution (a, (3) with quadratic convergence due to QIR. Hence, the values f(xo, yo) and g(xo, yo) 
become very small after a few iterations. In order to improve the above bounds, we propose to consider 
more sophisticated methods from numerical analysis and matrix perturbation theory \22\ I31j . Finally, 
we would like to emphasize that our method applies particularly well to the situation where we are only 
interested in the solutions of (2.1 ) within a given box B = [A, B] x [C, D] C M 2 . Namely, in Project, we 



only have to search for roots within the interval [A, B] ([C, D]) for 1?W (R^) and only candidate boxes 
within B have to be considered in Seperate and Validate. 



4 Speedups 

4.1 Resultants on graphics hardware 

Computing the resultants of bivariate polynomials is an important "symbolic part" of our algorithm. 
Despite a large body of research existing on this subject, symbolic computations still constitute a large 
bottleneck in many algorithms and substantially limit their range of applicability. We use a novel approach 
exploiting the power of GPUs to dramatically reduce the time for computing resultants. In this section, 
we briefly discuss the algorithm; we refer the reader to [17} IT6] for details. 

Our approach is based on the classical "divide-conquer-combine" modular algorithm by Collins [9] . The 
algorithm can be summarized in the following steps. 1. Apply modular and evaluation homomorphisms 
to map the problem to computing a large set of problems over a simple domain. 2. Compute a set of 
univariate resultants over a prime field. 3. Recover the resultant through polynomial interpolation and 
Chinese remaindering. 

Unfortunately, Collins' algorithm in its original form is not suitable for a realization on the GPU. This 
is because the amount of parallelism exposed by the modular approach is far too low to satisfy the needs 
of massively-threaded architectures. We deal with this issue by reducing the problem to computations 
with structured matrices because matrix operations typically map very well to the GPU's threading 
model. As a result, all steps of the algorithm except the initial modular reduction and partly the Chinese 
remaindering are run on the graphics hardware, thereby minimizing the amount of work to be done on the 
CPU. For expository purposes, we outline here the computation of univariate resultants in more detail. 

Suppose, / and g are polynomials in 7L\x\ of degrees m and n respectively. It is clear that the resultant 



of / and g reduces to the triangular factorization of the Sylvester matrix S (see Section 3.1 ). The matrix 
S G 27 xr (r = m + n) is structured as it satisfies the displacement equation [23j : 



S - Z r SA T = GB T , with A = Z m Z n and G, B G Z rx2 , 

here Z s G Z sxs is a down-shift matrix zeroed everywhere except for l's on the first subdiagonal. Accord- 
ingly, the generators G, B are matrices whose entries can be deduced from the matrix S by inspection. 
Hence, we can apply the generalized Schur algorithm which operates on the matrix generators to compute 
the matrix factorization in C(r 2 ) time, see |23t p. 323]. 

In short, the Schur algorithm is an iterative procedure: In each step, it brings the matrix generators to 
a "special form" from which triangular factors can easily be deduced based on the displacement equation. 
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Figure 4.1: (a) Intervals containing the roots of f(a,y) and g(a,y) are refined until they either do not overlap or 
are fully included in candidate boxes. In the former case, the boxes can be discarded, (b) Unvalidated candidates 
are passed to bidirectional filter which runs bitstream isolation in another direcion 



Using division-free modifications this procedure can be efficiently performed in a finite field giving rise to 
the factorization algorithm running in 0(r) time using r processors. 



Suppose that we have evaluated the polynomials /,j £ 7\x,y] as defined in (2.1) at a number of 

— (p) 

points Xi G Z p and computed a set of univariate resultants over a prime field Z p , that is, z- = 
res(f(xi,y),g(xi,y),y) G 7L V . Then, the resultant polynomial R^ v \x) is interpolated over the prime 
field Z p and eventually lifted to an integer solution via Chinese remaindering. We remark that polyno- 
mial interpolation corresponds to solving the Vandermonde systemj^] Again, exploiting the structure of 
Vandermonde matrix we can use the Schur algorithm to solve the system in a small parallel time. 



4.2 Filters 

Besides the parallel resultant computation, our algorithm elaborates some filtering techniques to early 
validate a majority of the candidates. 

As first step, we group candidates along the same vertical line (a fiber) at an x-coordinate a (a root of 
R^) to process them together. This allows us to use extra information on the real roots of f(a, y) G M.[y] 
and g(a, y) G for candidate validation. We replace the tests based on interval evaluation (see page [5]) 
by a test based on the bitstream Descartes isolator [15] (Bdc for short). This method allows us to isolate 
the real roots of a polynomial with "bitstream" coefficients, that is, coefficients that can be approximated 
to arbitrary precision. Bdc starts from an interval guaranteed to contain all real roots of a polynomial, 
and proceeds with interval subdivisions giving rise to a subdivision tree. Accordingly, the approximation 
precision for coefficients is increased in each step of the algorithm. Each leaf of the tree is associated with 
an interval and stores an upper and a lower bound on the number of real roots within this interval based 
on Descartes' Rule of Signs. An interval is not further subdivided when both bounds equal 0, where the 
interval is discarded, or 1, where we have found an isolating interval. Isolating intervals can be refined 
to arbitrary precision. We remark that Bdc terminates if all real roots are simple. Otherwise, intervals 
which contain a multiple root are further refined but never certified to contain a root. 

In our algorithm, we apply Bdc to the polynomials f(a,y) and g(a,y). Eventually, intervals that do 
not share a common root of both polynomials will be discarded. This property is essential for our "filtered" 
algorithm: a candidate box B(a, (3) can be rejected as soon as the associated y-interval I{j3) does not 



overlap with at least one of the isolating intervals associated with f(a,y) or g(a,y); see Figure 4.1 (a). 

Grouping candidates along a fiber x = a also enables us to use combinatorial tests to discard or to 
certify them. First, when the number of certified solutions reaches mult (a), the remaining candidates 
are automatically discarded because each real solution contributes at least once to a's multiplicity as a 



5 Here we are not concerned with the fact that Vandemonde systems are notoriously ill-conditioned since all operations 
are performed in a finite field. 



s 



Table 1: Description of the curves used in the first part of experiments. In case only a single curve given, the second 
curve is taken to be the first derivative w.r.t. y-variable. 



Instance 


Description 


Instance 


Description 


L4_circles 


circles w.r.t. L4-norm, clustered solutions 


SA_4_4_ep{_ 




singular points with high tangencies, displaced 


curvejssac 


a curve appeared in [8] 


FTT_5_4_' 




many non-rational singularities 


tryme 


covertical solutions, many candidates to check 


dfold_10J 






a curve with many half-branches 


large_curves 


large number of solutions 


cov_sol_20 


covertical solutions 


degree_6_surf 


silhouette of an algebraic surface, covertical so- 
lutions in both directions 


mignote_xy 


a product of x/y- Mignotte polynomials, dis- 
placed; many clustered solutions 


j™^ — 

challenge.!!: 


many candidate solutions to be checked 


spider 


degenerate curve, many clustered solutions 



* These curves were taken from 



root of (cf. Theorem [l]). Second, if mult(a) is odd and all except one candidate along the fiber are 
discarded, then the remaining candidate must be a real solution. This is because complex roots come in 
conjugate pairs and, thus, do not change the parity of mult (a). We remark that, in case where the system 
(2.1) is in generic position and the multiplicities of all roots of R are odd, the combinatorial test already 
suffices to certify all solutions without the need to apply our inclusion predicate. 

Now, suppose that, after the combinatorial test, there are several candidates left along a fiber. For 
instance, the latter can indicate the presence of covertical solutions. In this case, before using the inclusion 
predicate, we can apply the aforementioned filters in horizontal direction. More precisely, we construct 
the lists of unvalidated candidates sharing the same y-coordinate f3 and process them along a horizontal 
fiber. For this step, we initialize the bitstream trees for f(x, /3) and g(x, j3) and proceed in exactly the 
same way as done for vertical fibers; see Figure 4.1 (b). Candidates that still remain undecided after 
all tests are processed by considering our inclusion predicate. In Section [5j where we next examine the 
efficiency of our filters, we will refer to this procedure as the bidirectional filter. 



5 Implementation Sz. Experiments 

We have implemented our algorithm as a prototypical package of Cgal|^] As throughout the library 
we follow a generic programming paradigm that, for instance, enables us to easily exchange the number 
types used or the method to isolate the roots of a polynomial without altering the main structure of the 
implementation. 

In our experiments, we have used the number types provided by Gmp 4.3.1 and fast polynomial GCD 
from Ntl 5.5 library^] All experiments have been run on 2.8GHz 8-Core Intel Xeon W3530 with 8 MB of 
L2 cache under Linux platform. For the GPU-part of the algorithm, we have used the GeForce GTX480 
graphics processor (Fermi Core). We compared our approach to the bivariate version of Isolate (based 
on Rs by Fabrice RouillieiJ^]) and Lgp by Xiao-Shan Gao et alj^] We remark that, for the important 
substep of isolating the real roots of the elimination polynomial, all three contestants (including our 
implementation) use the highly efficient implementation provided by Rs. 

Our tests consist of two parts: In the first part, we consider "special" curves (and their derivative w.r.t. 
y-variable) selected in the aim of challenging different parts of the algorithm and showing the efficiency 



of the filtering techniques given in Section 4.2 These curves, for instance, have many singularities or 
high-curvature points which requires many candidates to be tested along each vertical line, or prohibit 
the use of special filters. Descriptions of the considered curves and corresponding timings are listed in 
Table [T] and Table [2j respectively. In the second part of our experiments, we study the performance of 



3 Computational Geometry Algorithms Library, www.cgal.org 



7 Gmp: http://gmplib.org, Ntl: http://www.shoup.net/ntl 

8 RS: http : //www. loria.fr/equipes/vegas/rs 

9 The software is available at http://www.mmrc.iss.ac.cn/~xgao/software.html 
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Table 2: Experiments for the curves listed in Table [l] Execution times are in seconds, including resultant com- 
putations. BlSOLVE-GPU: our approach with GPU-resultants; Bisolve-CPU: our approach with Cgal's CPU- 
resultants; Isolate and Lgp use Maple's implementation for the resultant computation. Bold face indicates 
default setup for Bisolve. 



Instance 


y-degree 


#sols 


BS+AL 

CPU 


FILTERS 

GPU 


BS+BSTR+COMB BS+BSTR BS 

GPU 


Isolate 
Maple 


Lgp 
Maple 


L4_circles 


16 


17 


2.74 


1.68 


1.52 


1.71 


0.68 


1.20 


7.40 


curveJssac 


15 


18 


4.30 


3.21 


2.70 


3.47 


1.84 


70.91 


3.54 


tryme 


24, 34 


20 


98.56 


29.31 


31.63 


89.83 


89.86 


167.81 


176.86 


large_curves 


24, 19 


137 


110.20 


91.15 


90.82 


376.71 


377.55 


501.52 


138.35 


degreeJLsurf 


42 


13 


149.33 


17.46 


16.50 


18.63 


62.18 


timeout 


133.77 


challenge_12 


40 


99 


108.74 


23.07 


27.66 


27.20 


195.76 


41.13 


40.86 


SA_4_4.eps 


33 


2 


155.92 


2.83 


2.85 


3.81 


8.57 


296.02 


56.30 


FTT_5_4_4 


40 


62 


73.89 


17.58 


20.92 


20.99 


111.22 


timeout 


199.92 


dfold_10_6 


32 


21 


26.20 


4.80 


3.12 


3.19 


3.54 


3.14 


3.84 


cov_sol_20 


20 


8 


27.25 


12.36 


36.10 


42.81 


52.16 


762.80 


175.85 


mignotte_xy 


32 


30 


545.88 


438.38 


440.64 


986.68 


1021.50 


timeout 


timeout 


spider 


28 


38 


389.06 


81.63 


87.44 


135.15 


314.56 


timeout 


timeout 



timeout: algorithm timed out (> 1500 sec) 



the Bisolve on random polynomials with increasing total degrees and coefficient bit-lengths. We refer 
the reader to Table [3] for the corresponding timings. Appendix [B] features further experiments. 

In columns 4-8, the experiments for our algorithm are given with all filters set on (BS+allfilters), 
with bitstream and combinatorial filter (BS+bstr-|-COMb), with bitstream filter only (BS+bstr) and 
with all filters set off (BS). For Bisolve, we report timings respectively with and without GPU resultant 
algorithm. For the remaining configurations we show only the timings using GPU resultants. CPU-based 
timings can easly be obtained by taking the difference between BisOLVE-columns. 

One can observe that our algorithm is generally superior to Isolate and Lgp even if the filters are not 
used. By comparing columns 5-8 in the table, one can see that filtering sometimes results in a significant 
performance improvement. The combinatorial test is particularly useful when the defining polynomials of 
the system (2.1) have large degrees and/or large coefficient bit-length while at the same time the number 
of covertical or singular solutions is small compared to the total number of candidates being checked. The 
bidirectional filter is advantageous when the system has covertical solutions in one direction (say along 
y-axis) which are not cohorizontal. This is essentially the case for challenge_12, cov_sol_20 and spider. 

Another strength of our approach relates to the fact that the amount of symbolic operations is crucially 
reduced. Hence, when the time for computing resultants is dominating, the GPU-based algorithm offers 
a speed-up by the factor of 2-5 over the version with default resultant implementation. It is also worth 
mentioning that both Isolate and Lgp benefit from the fast resultant computation available in Maple 13 
while Cgal's default resultant computatiorj^] is generally much slower than that of Maple. As a result, 
there is a large discrepancy columns 4 and 5 for Bisolve. 

Table [3] lists timings for experiments with random curves. Each instance consists of five curves of 
the same degree (9 or 15, dense or sparse) and we report the average time to compute the solutions 
for one of all ten pairs of curves. In order to analyze the influence of the coefficients' bit-lengths, we 
multiplied each curve by 2 k with k G {128, 512, 2048} and increased the constant coefficient by one. Since 
the latter operation constitutes only a small perturbation of the vanishing set of the input system, the 
number of solutions remains constant while the content of the polynomials' coefficients also stays trivial. 
We see that the bidirectional filtering is not of any advantage because the system defined by random 
polynomials is unlikely to have covertical solutions. However, in this case, most candidates are rejected 
by the combinatorial check, thereby omitting (a more expensive) test based on Theorem |4j This results 

10 Authors are indebted to Cgal developers working on resultants. 
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Table 3: Averaged running times for 10 pairs of curves defined by random polynomials of degree 9 and 15 with 
increasing bit-lengths (given by shift parameter). For description of configurations, see Table [2| 



Density of 
polynomials 


y-degree 


shift 


avg. 
#sols 


BS+AL 

CPU 


^FILTERS 

GPU 


BS+BSTR+COMB BS + BSTR 

GPU 


BS 


Isolate 
Maple 


Lgp 
Maple 


dense 


9,9 


128 
512 
2048 


5.6 


0.30 
0.48 
2.22 
16.47 


0.21 
0.15 
0.31 
2.07 


0.22 
0.16 
0.31 
2.06 


0.21 
0.15 
0.31 
2.07 


0.19 
0.49 
2.03 
13.86 


0.33 
0.66 
1.51 
7.48 


0.24 
0.93 
3.33 
102.37 


dense 


15,15 


128 
512 
2048 


5.0 


1.82 
6.02 
32.18 
251.07 


0.71 
0.69 
1.48 
8.97 


0.70 
0.67 
1.45 
8.94 


0.71 
0.67 
1.48 
8.97 


1.64 
3.74 
14.35 
94.09 


6.85 
14.66 
38.27 
141.69 


3.88 
8.31 
22.36 
102.36 


sparse 


9,9 


128 
512 
2048 


4.5 


0.10 
0.14 
0.46 
3.11 


0.07 
0.08 
0.16 
0.84 


0.07 
0.08 
0.15 
0.84 


0.07 
0.07 
0.15 
0.84 


0.09 
0.21 
0.85 
5.86 


0.07 
0.20 
0.70 
5.40 


0.22 
0.57 
1.74 
7.38 


sparse 


15,15 


128 
512 
2048 


3.8 


0.65 
1.55 
7.70 
58.97 


0.36 
0.46 
1.55 
13.45 


0.36 
0.47 
1.55 
13.38 


0.36 
0.46 
1.55 
13.46 


0.66 
1.50 
6.80 
51.14 


0.99 
4.68 
16.01 
132.76 


1.24 
3.51 
11.93 
74.03 



in a clear speed-up over a "non-filtered" version. Also, observe that GPU-Bisolve is not vulnerable to 
increasing the bit-length of coefficients while this becomes critical for Isolate's and Lgp's performance. 
We have also observed that, for our filtered versions, the time for the validation step is almost independent 
of the bit-lengths. 

We omit experiments to refine the solution boxes to certain precision as this matches the efficiency of 
QIR due to the fact that we have algebraic descriptions for solutions' x- and y-coordinates. 

6 Summary and Outlook 

We propose an exact and complete method to isolate the real solutions of a bivariate polynomial system. 
Our algorithm is designed to reduce the number of purely symbolic operations as much as possible. 
Eventually, only resultant computation and square-free factorization of the resultant polynomial are still 
needed. By transferring the resultant computation to the GPU, we are able to remove a major bottleneck 
of elimination approaches. In order to further improve our implementation, we aim to outsource the 
square-free factorization to the GPU as well, a step which seems to be feasible since factorization is also 
well suited for a " divide-conquer-combine" modular approach. Since our initial motivation was to speed 
up the topology and arrangement computation for algebraic curves and surfaces, we plan to extend our 
method towards this direction. Furthermore, it would be interesting to extend our algorithm to handle 
higher dimensional systems or complex solutions. Finally, we would like to investigate in hybrid methods 
such as the combination of a numerical complex root solver and an exact post certification method serving 
as an additional filter in the validation step (in the spirit of |344 I32j). We are convinced that most of 
the candidate boxes could be treated even more efficiently by the use of such methods. We claim that, 
eventually, the total costs for solving a bivariate system should only be dominated by those of the root 
isolation step for the elimination polynomial. For many instances, our experiments already hint to the 
latter claim. We aim to further improve our implementation to show this behavior for all instances and 
to provide a proof in terms of complexity as well. 
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A Proofs 



Theorem 5 Consider a disk A = A m (r) C C with center m and radius r. 

1. If T^(m,r) holds for some K > 1, then the closure A of A contains no root of p. 

2. If T^(m,r) holds for a K > y/2, then A contains at most one root of p. 
Proof: (1) follows from a straight forward computation: For each z 6 A, we have 

p{z) = p(m + (z — m)) = p(m) + — — — (z — m) k 

k>\ k - 



and, thus, 



\p(m)\ \p(m)\ ^ k\ \ K 



since \z — m\ < r and T^(m,r) holds. In particular, for K > 1, the above inequality implies \p(z)\ > 
and, thus, p has no root in A. 

It remains to show (2): If T^(m, r) holds, then, for any point z € A, the derivative p'{z) differs from 
p'(m) by a complex number of absolute value less than \p'(m)\/K. Consider the triangle spanned by the 
points 0, p'{m) and p'(z), and let a and f3 denote the angles at the points and p'(z), respectively. From 
the Sine Theorem, it follows that 

\sma\ = \p'(m)-p'(z)\.^ ] <^. 

Thus, the arguments of p'(m) and p'(z) differ by less than arcsin(l/ET) which is smaller than or equal to 
7r/4 for K > Assume that there exist two roots a, b G A of p. Since a = b implies p'(a) = 0, which 
is not possible as Tf (m, r) holds, we can assume that a ^ b. We split / into its real and imaginary part, 
that is, we consider p(x + iy) = u(x,y) + iv(x,y) where u,v : R 2 ->• R are two bivariate polynomials. 
Then, p(a) = p(b) = and so u(a) = v(a) = u{b) = v(b) = 0. But u(a) = u(b) = implies, due to the 
Mean Value Theorem in several real variables [], that there exists a 4> G [a, b] such that 

Vu((p) _L (b -a). 

Similarly, v(a) = v{b) = implies that there exists a ^ € [a,b] such that Vv(£) _L (b — a). But Vt>(£) = 
(^x(O) v y(0) = ( — u y(0i u x(0)j thus, it follows that Vw(£) || (6 — a). Therefore, Vu(ip) and V«(£) must 
be perpendicular. Since p' = u x + iv x = u x — iu y , the arguments of p'(^) and p'{£,) must differ by ir/2. 
This contradicts our above result that both differ from the argument of p'(m) by less than 7r/4, thus, (2) 
follows. I 
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B Further Experiments 



Table 4: Description of the curves used in experiments. In case only a single curve is given, the second curve is 
taken to be the first derivative w.r.t. y-variable. 



Instance 


Description 


Instance 


Description 


harcLone 


vertical lines as component of one curve, many 
candidates to test 


compact_surf 


silhouette of an algebraic surface, many singu- 
larities, isolated solutions 


grid_deg_10 


large coefficients, curve in generic position 


13_sings_9 


large coefficients, high-curvature points 


huge_cusp 


large coefficients, high-curvature points 


swinnerston_dyer 


covertical solutions in both directions 


cusps_and_flexes 


high-curvature points 


PH — 

challenge_12 J 


many candidate solutions to be checked 


L6_circles 


4 circles w.r.t. L6-norm, clustered solutions 


SA_2_4_ep< T 


singular points with high tangencies, displaced 


ten_circles 


set of 10 random circles multiplied together, 
rational solutions 


spiral29_24 


taylor expansion of a spiral intersecting a curve 
with many branches, many candidates to check 


curve24 


curvature of degree 8 curve, many singularities 







* These curves were taken from [26] 



Table 5: Results for the curves listed in Table 4 We used the same configurations as in Table 2 



Instance 


y-degree 


#sols 


BS+A 
CPU 


^LFILTERS 

GPU 


BS+BSTR+COMB BS+BSTR 

GPU 


BS 


Isolate 
Maple 


Lgp 
Maple 


hard_one 


27, 6 


46 


8.17 


6.95 


6.96 


12.44 


12.09 


25.20 


20.00 


grid_deg_10 


10 


20 


4.05 


1.63 


1.64 


3.22 


3.01 


106.95 


3.16 


huge_cusp 


8 


24 


33.24 


21.43 


21.15 


26.97 


26.47 


768.56 


119.03 


cusps_and .flexes 


9 


20 


2.31 


1.37 


1.28 


1.70 


1.38 


28.42 


2.73 


L6_circles 


24 


18 


25.00 


6.21 


5.68 


6.88 


5.08 


46.61 


52.79 


curve24 


24 


28 


41.26 


16.61 


16.66 


118.91 


115.62 


49.69 


41.96 


ten_circles 


20 


45 


10.51 


7.64 


4.63 


4.93 


2.57 


5.22 


5.24 


compact _surf 


18 


57 


19.01 


6.53 


5.98 


5.85 


23.56 


timeout 


12.31 


13_sings_9 


9 


35 


3.38 


2.39 


2.23 


2.98 


2.41 


28.60 


2.97 


swinnerston_dyer 


40 


63 


50.32 


22.60 


22.53 


22.33 


56.46 


71.00 


28.47 


challenge_12_l 


30 


99 


24.67 


9.17 


9.89 


9.44 


41.84 


41.13 


40.86 


SA_2_4.eps 


17 


6 


6.71 


0.56 


0.59 


0.70 


1.66 


7.83 


4.90 


spiral29_24 


29, 24 


51 


80.37 


35.34 


35.13 


290.35 


286.79 


144.79 


84.97 



timeout: algorithm timed out (> 1500 sec) 
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